Data Structures
Objective #1: Use the Set interface and understand that it is implemented by the classes HashSet and TreeSet.
- A set is an unordered collection of distinct objects. There are no duplicates in a set.
- Examples of when you might use a set are for keeping track of registered user names, correctly spelled words in a spelling dictionary, etc.
- The java.util.Set<E> interface with the following methods will be used on the AP exam:
int size();
boolean add(E x);
boolean contains(Object x);
boolean remove(Object x);
Iterator<E> iterator();
- The elements in a Set are not kept in any specific order. There is no random access. An iterator must be used to traverse the elements in whatever order the given implementation establishes. The order that the elements are visited during a traversal is not necessarily the same order that they were inputed into the set. Note that a ListIterator is not used with sets since the ListIterator methods add, set, and previous and those methods have to do with random access and/or iterating over elements in a particular order.
- The java.util.HashSet<E> class implements Set<E>.
- It uses a hash table (discussed below) to determine what makes every element of a set unique.
- A hash set does not assume that its elements are Comparable. It uses a hashCode method and the equals method
to determine if two objects are equal and to prevent duplicates from being inputed into a HashSet.
- The add, remove, contains, and size methods of a hash set have a time efficiency of O(1) in the best case. In the worst case, add, remove, and contains are O(n).
- The statement System.out.println(myHashSet); can be used to display the elements of myHashSet in whatever order the iterator feels like using.
- To instantiate a HashSet, you should associate a type of object with the HashSet by typing the name of the type in angle brackets directly after the reference to HashSet as in
HashSet<String> names = new HashSet<String>();
You can also instantiate a HashSet as a reference to a Set with the statement
Set<String> names = new HashSet<String>();
- To add an object reference to your HashSet, you can use the statement names.add("Minich");
- The java.util.TreeSet<E> class implements Set<E>.
- It uses a balanced binary search tree (BST) to determine what makes every element of a Set unique.
- Elements placed into a TreeSet are assumed to be Comparable so the compareTo method is used to determine uniqueness and order.
- An iterator returns a TreeSet's values in ascending order.
- The add, remove, and contains methods of a TreeSet have a time efficiency of O(log n) in the best case. In the worst case, add, remove, and contains are O(n) if the binary tree is not balanced.
- Elements accessed via a TreeSet iterator will be returned in ascending order using the elements' compareTo method.
- A TreeSet's least element is returned by the first call of an iterator's next method.
- The statement System.out.println(myTree); can be used to display the elements of the TreeSet myTree in ascending order.
- If you need to keep the elements of a Set ordered, use a TreeSet. Otherwise, use a HashSet since it works faster. For example, if you know that you will need to print out all of the values of a set in sorted order from time to time in addition to searching for the existence of a value in the sort, you should use a TreeSet. If you will only need to look up whether a value is in a set or not, you should probably use a HashSet.
- After creating an iterator for a HashSet or TreeSet, an error occurs if you attempt to modify the set with any method other than the iterator's remove method.
- Typical algorithms or methods that you may have to write involving sets are:
- finding the union of two sets
- finding the intersection of two sets
- finding the difference of two sets (i.e. the elements of one set that are not in the other set)
- testing whether one set is a subset of another set
Objective #2: Use the Map interface and understand that it is implemented by the classes HashMap and TreeMap.
- A map is a data type that keeps associations between keys and values. A map holds key-value pairs of objects. Each key has a unique value. The keys form a Set. One value is associated with each key but different keys can be associated with the same value. The keys and the values must be objects rather than primitive data types.
- Examples of when you would use a map are a phone book (names & phone numbers), file management on a computer (file names and hard drive memory addresses), logging on to a computer (login names and passwords), etc.
- The java.util.Map<K,V> interface with the following methods will be used on the AP exam:
int size();
returns the number of key/value mappings in the map
V put(K key, V value);
The put method associates key with value and inserts the pair into the map. If the map already contains a mapping for this key, the old value is replaced and the previous value is returned. However, null is returned if there was no previous value for this key.
V get(Object key);
returns the value associated with key. The method returns null if no mapping exists for key or if null is explicitly mapped to key.
boolean containsKey(Object key);
returns true if the map contains a mapping for key; false otherwise
Set<K> keySet();
returns the whole set of keys contained in the map.
- You can print a map using System.out.println(myMap); This statement would print all of the key/value pairs in myMap.
- Two classes will be used on the AP exam that implement Map. The HashMap class uses a hash table for the keys. The TreeMap class uses a binary search tree to store the keys.
- HashMap
- The elements in a HashMap are unordered. A hash table is used to store the elements of a HashMap. The put, get, containsKey, and size methods are O(1) assuming that the keys are uniformly distributed in the hash table.
- Here is a code segment that instantiates a HashMap that contains a Set of String's as keys and each key has a String as a matching value
HashMap<String, String> dict = new HashMap<String, String>();
- The following statements would be used to add several key-value pairs to the HashMap
dict.put("fiesta", "holiday");
dict.put("uno", "one");
dict.put("siesta", "rest");
dict.put("si", "yes");
- The following statement would print the value 4 since there are 4 key-value pairs in the HashMap
System.out.println(dict.size());
- The following statement would create a HashMap that is a Set of Set's. Each "inner" set is a set of String's.
HashMap<String, Set<String>> dict = new HashMap<String, Set<String>>();
- To add a key-value pair to the HashMap above, you would need to create a Set as in
HashSet<String> translation1 = new HashSet<String>();
translation1.add("holiday");
translation1.add("party");
translation1.add("celebration");
translation1.add("feast");
Then, to add the Set to the HashMap you use the statement dict.put("fiesta", translation1);
- To add an object reference to a value that is already linked to a map, you can use
dict.get("fiesta").add("social");
- TreeMap
- The keys of a TreeMap are ordered but not necessarily in the same order that they were initially added to the TreeMap. The elements in a TreeMap must implement the Comparable interface.
- A balanced binary tree is used to store the elements in a TreeMap.
- The containsKey, get, and put methods are O(log n).
- If you need to keep the key elements of a Map ordered, use a TreeMap. Otherwise, use a HashMap since it allows for faster searches.
- To iterate over a map, you first retrieve the set of all its keys and then iterate over that set with an Iterator. You will probably not have to iterate over a map on the AP exam but here is one way to do so. First retrieve a set of keys using the keySet method. Then use the Set interface's iterator method to obtain an Iterator. But remember that, besides using the remove method, no outside modification with external methods is allowed during an iteration.
for (Iterator iter = map.keySet().iterator(); i.hasNext();)
System.out.println(iter.next());
Objective #3: Understand hash tables and hash functions and how they are used in the implementation of HashSet and HashMap.
- Hashing is a way of storing info that makes it very efficient and fast to look up the info.
- A hash table is used to store the data by way of a hash function.
- A hash function turns a value into an index position for the hash table which is usually an array. For example, if you need to store people's last names into a hash table, your hash function may simply add up all of the Unicode values of the characters in each name. A name is then stored at the position of the hash table that is equal to the sum of the Unicode values.
- Every object in Java inherits the Object class' hashCode method but it is wise to override that method for a class
if you wish to store its objects in a HashSet or HashMap.
In order for a HashSet and HashMap to work quickly and efficiently, the hashCode method must be good. And, to resolve collisions the equals method
(inherited & overrided from the Object class) must work correctly. That is if x.equals(y), then x.hashCode() == y.hashCode() . The equals and hashCode methods need to be consistent in the sense
that if two methods are equals then they must return the same
hashCode. The reason for this is when the object is used as the key in
a hashMap or hashSet, its bucket (i.e. rough location) is determined
by the hashCode, but when looking through the bucket on a get
invocation, equals is used to see if the key is present. Furthermore, the hashCode method must be
fixed for any given run of the program. This means that you should not
base the value of hashCode on any mutable state of the object. Suppose you use the object as a key in a HashMap (putting it into
a bucket), and then invoke a mutator that changes the hashCode. When
you try to later get that key, the HashMap will be looking in the
wrong bucket and won't find the object.
- The best size for a hash table is a prime number larger than the number of elements that your hash table is expected to hold. Having about 30% of the hash table's array positions remain empty is optimal.
- There are two constructors for the HashSet class. The default constructor constructs an empty hash table with default initial capacity. When the table gets too full, a new table that is twice the size of the original is created and all of the existing elements are copied over. The constructor that takes an int parameter creates an empty hash table with the number of positions equal to the constructor's parameter.
- Usually the time to look up a value in a hash table is O(1), assuming that there is only 1 element per bucket. Inserting a value into a hash table is also O(1). In the worst case a look up and a hash table insertion would degrade to O(n).
- There is a problem that occurs with hash tables however. In a hash table, two or more values may map to the same location. When this happens, it is called a collision.
- Chaining is one way to handle collisions. Chaining involves placing a second value into a bucket (which could be a LinkedList). Think of a hash table that consists of an array with a linked list at each position of the array. If the hash function sends two or more elements to a position of the array, they simply link to each other as nodes in that array position's linked list. Sometimes arrays or even binary search trees are used as buckets.
- Probing is another way to handle collisions. When a value collides with one that is already placed in a slot of the hash table, a probing function is used to convert that index position into another index position and so on until a vacant slot is found. A real simple probing function is to add one to the index position. This is called linear probing. Linear probing often results in clustering with many values clumping in one area of the hash table. Another probing function is to use try the index positions index + 1, index + 4, index + 9, etc. This is called quadratic probing.
- A good hash strategy is one that uses a good hash function and that handles collisions well. A good hash function is easy to calculate and naturally spreads out the values onto the given range (i.e. length of the array) and minimizes collisions.
- The java.util.HashSet class, which implements the Set interface of course, uses a hashing strategy for storing its values. The values do not have to be Comparable! It is only necessary that the values be hashed by a hash function. When the HashSet's add or contains methods execute, the hashCode method is called.
- The java.util.HashMap class, which implements the Map interface of course, uses a hashing strategy for storing its keys. The keys do not have to be Comparable. However, the hashCode method for the key objects will execute when the HashMap's put and containsKey methods are called.
- A lookup table is another way of providing fast search and retrieval times. For example, if you wanted to look up the name of the town where a post office of any given zip code is located, you could create an array of String's with a length of 100000. The name of any given town (e.g. "Wyomissing") would be stored in its corresponding position of the look-up table (e.g. 19610 in the case of Wyomissing). The O(1) time for searching information in a look-up table is O(1). A lookup table typically uses a one or two-dimensional array and uses a lot more space than a hash table. But, in a lookup table, each value maps to one and only one location of the array so collisions do not occur.
Objective #4: Trace & write code that uses binary tree-related algorithms.
- A tree is a collection of nodes and edges. A node is an object with data and pointers to its child nodes.
- A binary tree is a tree in which no node has more than 2 child nodes. A non-empty tree's root node has no parent node. A leaf is a node with no child nodes.
- The height (aka depth or level) of a non-empty tree is the number of nodes in the longest path from the root to a leaf. The height of an empty tree is defined to be zero although some authors consider the height of an empty tree to be -1. The height of a tree with only one node is one although some authors consider the height of a tree with only one node to be 0.
- A node that is in the same path as a node but that is located closer to the root is called an ancestor of the node. The node itself is a descendant of its ancestor node. The level of a node is equal to its depth. Nodes on the same level are called siblings.
- A tree is considered to be full if all of its levels are filled with nodes. A full tree with h levels has 2 ^ h - 1 nodes. Although for authors who consider a tree with one node to have a height of 0 rather than 1, a full tree with h levels has 2 ^ (h + 1) - 1 nodes. Each level in a full tree has twice as many nodes as the preceding level.
- A tree is complete if it has no gaps of nodes on any level except the last level where there can be missing nodes on the right. Every non-leaf node in a complete
tree must have two children. A balanced tree
has approximately the same number of nodes in the left and right subtrees at each level.
- A node is sometimes referred to as an internal node. The internal path length of a tree is the sum of the depths of all the internal nodes.
- An external node is a position where there could be a leaf. The external path length of a tree is the sum of the depths of all the external nodes of a tree. Though for authors who consider the height of an empty tree to be -1, this is defined as the sum of the depths of all leafs in the tree.
- The are four kinds of trees that will be discussed in this course:
- general binary trees (aka b-trees)
- binary search trees (aka BST)
- heaps - trees which fulfill the order and shape properties. In a min-heap, the root node is the least element.
- binary expression trees - contain operands and operators.
- The following TreeNode class will be used on the AP exam:
public class TreeNode
{
public TreeNode(Object initValue, TreeNode initLeft, TreeNode initRight)
{
value = initValue;
left = initLeft;
right = initRight;
}
public Object getValue()
{
return value;
}
public TreeNode getLeft()
{
return left;
}
public TreeNode getRight()
{
return right;
}
public void setValue(Object newValue)
{
value = newValue;
}
public setLeft(TreeNode newLeft)
{
left = newLeft;
}
public setRight(TreeNode newRight)
{
right = newRight;
}
private Object value;
private TreeNode left;
private TreeNode right;
}
Primitive types including int and double cannot be stored in TreeNode objects. The wrapper classes Integer and Double must be used.
- You must be able to trace & write recursive methods that can be used with binary trees since binary trees are recursive data structures that lend themselves to recursive algorithms. Often, these methods are use private helper methods that allows the root TreeNode object to be "hidden" behind the TreeSet (or BinarySearchTree) class.
- A preorder traversal of a binary tree involves visiting the root, the left subtree, and then the right subtree. If a binary expression tree is preorder traversed, the result is a prefix notation.
- An inorder traversal of a binary tree involves visiting the left subtree, the root, and then the right subtree.
- A postorder traversal of a binary tree involves visiting the left subtree, the right subtree, and then the root. If a binary expression tree is postorder traversed, the result is a postfix notation.
- A level order traversal (aka row order or breadth-first-order) is one which visits the nodes level by level starting with the root. The nodes on a given level are visited from left to right. This algorithm is not recursive. It uses a queue to control the output of the nodes. Here's the pseudocode for a level order traversal
- insert the root node in a queue
- while there are nodes left in the queue, do the following
- get the next node in the queue
- process the node (e.g. print its value)
- if the left child of the node is not null then insert the left child in the queue
- if the right child of the node is not null, then insert the right child in the queue
- A few other types of traversals that are unlikely to show up on the AP exam include:
- reverse inorder: right subtree, root, left subtree
- reverse preorder: root, right subtree, left substree
- reverse postorder: right subtree, left subtree, root
- Any method that must change the structure of a tree must be passed the TreeSet itself. Passing a head TreeNode cannot bring about a change to the structure of the tree. Such methods must return a TreeNode. The recursion should be performed in a helper method.
- Methods whose purpose to find a key value or compute a value (e.g. depth of a tree) can be passed a head TreeNode. They also do not require helper methods but sometimes are implemented with a private helper method to hide the implementation details.
- Some common operations on a binary tree for which you can write methods:
findInBinaryTree - find the largest or smallest value in a binary tree
findInBinarySearchTree - find the largest or smallest value in a binary search tree
treeSum - returns the number of internal nodes in a binary tree
depth - returns the depth (or height) of a binary tree
countNodes - counts the number of nodes in a tree
countLeafs - counts the number of leaf nodes in a tree
countExternalNodes - returns the number of external nodes
isBinarySearchTree - returns true if a tree is a binary search tree
computeSum - returns the sum of all of the Integer or Double values in a tree
leafSum - returns the sum of the Integer or Double values found in all of the leafs in a tree
internalPathLength - returns the internal path length of a tree
externalPathLength - returns the external path length of a tree
isFull - determines if a tree is full
areSimilar - returns true if two trees are similar. Trees are similar if they have the same exact shape and structure.
twenty-questions game - classic game in which the player replies yes or no to each question. The computer eventually guesses the player's secret object.
isLeaf - non-recursive method that returns true if its TreeNode parameter is a leaf, false otherwise
- The algorithm for searching a BST is similar to a "divide & conquer" binary search algorithm that is applied to an array or ArrayList.
- On the AP exam, you must know how to perform insertions, deletions, traversals, and work with iterators with BST's. You must be able to print the values stored in a BST using inorder,
postorder, or preorder traversals. You must also be able to search a BST for a given value.
Objective #5 : Use binary search trees and understand how binary search trees are used in the implementation of TreeSet and TreeMap classes.
- A binary search tree (BST) is a special kind of binary tree in which the nodes are ordered with every node in the left subtree is less than or equal to its parent node and every node in the right subtree is greater than its parent node. It also can be defined as every left node being less than or equal to its ancestor nodes and every right node being greater than its ancestor nodes.
- It is very efficient to insert, delete, and search for values in a BST. The average time is O(log n). The worst case time is O(n) in the case where the BST is not very balanced. In this worst case, an imbalanced BST would be a linked list.
- The java.util.TreeSet class is implemented as a binary search tree.
- Therefore, TreeSet guarantees reasonable search performance of O(log n) and allows its elements to be visited or printed in order. It also guarantees
that adding and removing an element takes O(log n) time.
- Since the TreeSet class implements the Set interface, it cannot have duplicate items. However, a general binary
search class could be allowed to contain duplicates. A TreeSet requires its elements to be comparable.
Objective #6: Use heaps.
- A heap is a special kind of binary tree that has the order (aka ordering or heap) property & the shape (aka completeness) property.
- A binary tree has the order property if, for every node, the value stored at that node is smaller than all of the values in its subtrees. Therefore, the smallest node in the tree will be at the root.
- A binary tree has the shape property and is considered to be almost complete if every leaf is at depth d or d-1 and all leaves at depth d are to the left of all leaves at depth d-1 and there is at most one node that has only one child (which must be a left child and the rightmost leaf at depth d.) All of the levels of a binary tree must be complete except for the last level.
- Inserting values into a heap is O(log n) as is removing and returning the smallest value in a heap. The process of "reheaping it down" and "reheaping it up is used to preserve the order and shape properties.
- A heap can be implemented using an array. Index position 0 of the array is kept empty. The root of the heap is stored in position 1 of the array. The children of the root are stored in index positions 2 and 3. For every other element in position i, the left child of the element is stored in position 2 * i and the right child in 2 * i + 1.
- One role of a heap in computer science is to use it for an efficient sorting algorithm. The heapsort algorithm inserts elements into a heap and then retrieve them from the heap in order to have them in sorted order. This is a fast, recursive, divide-and-conquer sorting algorithm that sorts in O(n log n) time.
Objective #7: Use priority queues.
- A priority queue is an abstract data type (like stacks and queues) that stores objects that each have a different priority. Unlike a normal queue, a priority queue is not FIFO. Instead, elements are retrieved according to their priority. For example, a collection of work requests would best be stored as a priority queue. A high school principal's work request to repair his office chair would have a higher priority than a teacher's work request to fix her classroom pencil sharpener. A superintendent's work request for a new light bulb would have a higher priority than the principal and the teacher's requests. The objects must implement Comparable since they need to be ordered by priority. The actual priority value might be an integer priority of each object that is stored in the priority queue with the lowest int value having the highest priority. Usually an integer is used to denote an object's priority with lower numbers indicating a higher priority.
- A heap is usually used to implement a priority queue. If higher priorities are indicated by lower integer values then a min-heap would be used to prioritize the collection of objects. A min-heap is organized with the lowest value in the root of the binary tree. Although, a max-heap could be used with higher values representing higher priorities.
- The AP exam will use a java.util.PriorityQueue<E> class that includes the following methods:
boolean isEmpty();
boolean add(E x);
E remove();
E peek();
- The isEmpty method returns true if the priority queue is empty; false otherwise.
- The add method adds the parameter x to the priority queue.
- The remove method removes the smallest object (i.e. the object with the highest priority) from the priority queue. This depends on the objects to be Comparable. An unchecked exception is thrown if an attempt is made to remove an item from an empty priority queue.
- The peek method returns the smallest object in the priority queue but does not remove the object. Again, an unchecked exception is thrown if an attempt is made to peek an item from an empty priority queue.
- The AP Java subset does not specify how PriorityQueue is implemented. The following concrete data structure implementations could be used for a priority queue:
- A min-heap (aka minimum heap) is probably the most efficient way to implement a priority queue. When a binary heap is used, each child node is greater than its parent node. Removing the minimum element from a min-heap is O(1) since the element will be the root of the tree. But this action forces you to reheap the heap and reheaping is O(log n). Adding a new node also requires reheaping so it is O(log n) as well.
- An ArrayList or array with elements in random order. Adding a node to the end of the array or ArrayList is O(1). Removing the minimum node (removeMin) would require a sequential search in O(n).
- An ArrayList or array with elements sorted by priority, that is the smallest elements at the end. Removing the minimum node requires removing the last element and occurs in O(1). Adding a new element requires a sequential or binary search and possibly requires creating a new slot for the new element. Thus the Big Oh time for add is O(n) even if a binary search is used.
- A LinkedList with elements stored in random order. Adding a node is done at the front of the LinkedList with O(1) insertion. Removing the minimum node (i.e. highest priority) requires a sequential search for the element with the highest priority yielding O(n).
- A LinkedList with elements sorted by priority, that is the smallest elements in front. Removing the smallest node is simply the removal of the first node and occurs in O(1). However, adding a node requires a sequential search in order to maintain sorted order and therefore requires O(n).
- When a heap is used to implement a PriorityQueue, add and remove are O(log n)
and isEmpty and peek are
O(1). Searching for an item is O(log n).
Objective #8: Measure these advanced data structures in terms of Big Oh time efficiency.
- linked list - searching for a particular element is O(n) since the search must be done sequentially
- ordered array or an ordered ArrayList- ....O(log n)....since a binary search can be used
- HashSet - ...hash table yields O(1)...
- TreeSet - Since a TreeSet is implemented as a balanced BST, the add, contains, & remove methods are O(log n) in worst case.
- TreeMap - Since a TreeMap is implemented as a balanced BST, the add, contains, remove, put, get, & containsKey methods are O(log n) in worst case.
- binary search tree
- insertion
- O(log n) is the average for balanced trees
- O(n) is the worst case for unbalanced trees.
- search/find
- O(log n) is the average for balanced trees.
- O(n) is the worst case for unbalanced trees.
creating a BST
- O(n log n) is the best case if elements are inputed in random order leading to a balanced tree with tree depth of log n
- O(n ^ 2) is the worst case if the elements are inputed in sorted or reverse order